Improving Word Sense Discrimination with Gloss Augmented Feature Vectors

Authors

  • Amruta Purandare
  • Ted Pedersen
Abstract

This paper presents a method of unsupervised word sense discrimination that augments co-occurrence feature vectors derived from raw untagged corpora with information from the glosses found in a machine-readable dictionary. Each content word that occurs in the context of a target word to be discriminated is represented by a co-occurrence feature vector. Each of these vectors is augmented with the content words that occur in the glosses of the different possible meanings of the word it represents. Then these vectors are averaged to create a vector that represents the context of the target word. Discrimination is carried out by clustering all of the vectors associated with the contexts in which the target word occurs. We show via an evaluation with the Senseval-2, line, hard and serve corpora that feature vectors augmented with gloss information from WordNet significantly improve discrimination performance when limited data is available.
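
The pipeline described in the abstract (co-occurrence vectors, gloss augmentation, averaging, clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy `COOC` and `GLOSSES` tables are invented here rather than taken from a corpus or from WordNet, adding one count per gloss word is only one plausible reading of "augmented", and the average-link clustering with cosine similarity is an assumed stand-in, since the abstract does not name a clustering algorithm.

```python
from collections import Counter
from math import sqrt

# Toy data invented for illustration only (not from the paper or WordNet).
COOC = {  # word -> co-occurrence counts observed in some raw corpus
    "river": Counter({"water": 3, "flow": 2}),
    "shore": Counter({"water": 2, "sand": 2}),
    "loan": Counter({"money": 3, "interest": 2}),
    "deposit": Counter({"money": 2, "account": 2}),
}
GLOSSES = {  # word -> content words pooled from the glosses of its senses
    "river": ["stream", "water"],
    "shore": ["land", "water"],
    "loan": ["money", "lend"],
    "deposit": ["money", "account"],
}

def augment(word):
    """Co-occurrence vector of `word` plus one count per gloss content word
    (an assumed augmentation scheme)."""
    vec = Counter(COOC.get(word, {}))
    vec.update(GLOSSES.get(word, []))
    return vec

def context_vector(context_words):
    """Average the augmented vectors of the known content words in a context."""
    vecs = [augment(w) for w in context_words if w in COOC]
    total = Counter()
    for v in vecs:
        total.update(v)
    n = max(len(vecs), 1)
    return {feature: count / n for feature, count in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[f] * b.get(f, 0.0) for f in a)
    na = sqrt(sum(x * x for x in a.values()))
    nb = sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, k):
    """Average-link agglomerative clustering down to k clusters
    (an assumed choice of clustering method)."""
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        a, b = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return [sorted(c) for c in clusters]

# Each context is the list of content words around one target-word instance.
contexts = [["river", "shore"], ["loan", "deposit"], ["shore"], ["deposit"]]
vectors = [context_vector(c) for c in contexts]
groups = cluster(vectors, k=2)
print(groups)
```

In this toy setup the contexts built from "river" and "shore" should fall into one cluster and those built from "loan" and "deposit" into the other. The gloss words add extra shared dimensions (for example "water" and "money"), which is the kind of signal the abstract reports as helpful when little corpus data is available.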

Similar resources

Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces

This paper systematically compares unsupervised word sense discrimination techniques that cluster instances of a target word that occur in raw text using both vector and similarity spaces. The context of each instance is represented as a vector in a high dimensional feature space. Discrimination is achieved by clustering these context vectors directly in vector space and also by finding pairwis...

Using WordNet-Based Context Vectors To Estimate The Semantic Relatedness Of Concepts

In this paper, we introduce a WordNet-based measure of semantic relatedness by combining the structure and content of WordNet with co-occurrence information derived from raw text. We use the co-occurrence information along with the WordNet definitions to build gloss vectors corresponding to each concept in WordNet. Numeric scores of relatedness are assigned to a pair of concepts by measuring the...

The generation of representations of word meanings from dictionaries

This paper describes the generation of iconic and categorical representations of word meaning, in propositional form, from the WordNet lexical database. These are derived from the list of synonyms, the descriptive gloss, and from the hypernym and meronym relations of each WordNet word sense. We demonstrate that these representations promote identification and discrimination, these being suggest...

Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...

Word Sense Discrimination Using Context Vector Similarity

This paper presents the application of context vector similarity for the purpose of word sense discrimination during query translation. The random indexing vector space method is used to accumulate the context vectors. Pairwise similarity of the context vectors of ambiguous terms with that of anchor terms indicated the possible correct translation of a query term. Two retrieval experiments wer...

Publication date: 2004